---
title: "Adding Risky Choices"
subtitle: "Foundational Report 5"
description: |
Extension of the SEU model to include risky choice data, enabling
identification of utility parameters through known-probability decisions.
categories: [foundations, m_1]
execute:
cache: true
---
```{python}
#| label: setup
#| include: false
import sys
import os
# Add parent directories to path
sys.path.insert(0, os.path.join(os.getcwd(), '..'))
project_root = os.path.dirname(os.path.dirname(os.getcwd()))
sys.path.insert(0, project_root)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
np.random.seed(42)
```
## Introduction
[Report 4](04_parameter_recovery.qmd) revealed differential parameter recovery in model m_0: while the sensitivity parameter α and feature weights β are recovered well, the utility increments δ show wider uncertainty and slower learning as sample size increases. This doesn't mean δ is uninformed by the data—the posteriors do concentrate relative to the prior—but the information accumulates more slowly than for other parameters.
One approach to improving δ recovery is simply to collect more data. However, decision theory suggests an alternative: incorporating **risky decisions** where probabilities are known to the decision-maker. This approach, grounded in the Anscombe-Aumann framework, provides a different type of information that may help constrain utility parameters more directly.
## Historical Background: Risk vs. Uncertainty
The distinction between decision-making under **risk** and under **uncertainty** is one of the oldest and most fundamental in decision theory.
### Knight's Distinction
Frank Knight [-@knight1921risk] drew a sharp distinction between:
- **Risk**: Situations where probabilities of outcomes are objectively known (e.g., a fair die)
- **Uncertainty**: Situations where probabilities are unknown or unknowable (e.g., will it rain tomorrow?)
Knight argued that these represent fundamentally different decision environments, with profit in business arising primarily from uncertainty rather than risk.
### Von Neumann-Morgenstern: Expected Utility Under Risk
The modern theory of rational choice under risk was axiomatized by @vonneumann1944theory. Their key insight was that preferences over **lotteries**—probability distributions over outcomes—could be represented by a utility function if certain axioms (completeness, transitivity, continuity, independence) were satisfied.
For a lottery $L = (p_1, o_1; p_2, o_2; \ldots; p_K, o_K)$ giving outcome $o_k$ with known probability $p_k$, the expected utility is:
$$
U(L) = \sum_{k=1}^K p_k \cdot u(o_k)
$$
The vNM framework applies only to risk—the probabilities $p_k$ are taken as given.
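As a minimal numerical illustration of the formula above (made-up probabilities and utilities, plain NumPy):

```python
import numpy as np

# A lottery over K = 3 outcomes with known probabilities p_k,
# and an illustrative utility value u(o_k) for each outcome
p = np.array([0.2, 0.5, 0.3])   # must sum to 1
u = np.array([0.0, 0.4, 1.0])   # utilities of the three outcomes

# Expected utility U(L) = sum_k p_k * u(o_k)
U_L = float(p @ u)
print(U_L)  # 0.2*0.0 + 0.5*0.4 + 0.3*1.0 = 0.5
```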
### Savage: Subjective Expected Utility Under Uncertainty
Leonard Savage [-@savage1954foundations] extended expected utility theory to uncertainty by introducing **subjective probabilities**. In his framework, the decision-maker acts as if assigning personal probabilities to states of the world and maximizing expected utility with respect to these beliefs.
Savage's axioms imply the existence of both:
1. A utility function $u$ over consequences
2. A probability measure $P$ over states
However, Savage's approach requires eliciting utilities and probabilities simultaneously from choice behavior—raising the identification problem we encountered with m_0.
### Anscombe-Aumann: The Horse Lottery Resolution
@anscombe1963definition proposed an elegant solution that bridges risk and uncertainty. Their key innovation was to consider choice objects that combine both kinds of probability:
::: {.callout-note}
## The Horse Lottery Framework
An **Anscombe-Aumann act** maps states of the world (the "horses") to lotteries (probability distributions over prizes).
- The probabilities in each lottery are objective (known)
- The probabilities across states are subjective (uncertain)
By varying the lotteries while holding the state-dependence fixed, one can identify the utility function from risky preferences. By varying the state-dependence while holding lotteries fixed, one can identify subjective probabilities.
:::
This insight is precisely what motivates model m_1. By observing choices in both:
1. **Uncertain decisions** (like Savage's acts, mapped through features to subjective probabilities)
2. **Risky decisions** (like vNM lotteries, with known probabilities)
...we can separately identify utilities and subjective probabilities.
### The Identification Logic
As @kreps1988notes explains in his exposition of the Anscombe-Aumann framework:
> "The introduction of objective lotteries... provides a way to calibrate cardinal utility... Once cardinal utility is pinned down by choices among lotteries, subjective probability can be inferred from choices among acts."
This is exactly our strategy:
| Choice Type | What It Reveals | Model Component |
|-------------|-----------------|-----------------|
| **Risky** (known $p$) | Utility function shape | δ parameters |
| **Uncertain** (unknown $p$) | Subjective probability formation | β parameters |
| **Both** | Choice sensitivity | α parameter |
## Model m_1 Specification
Model m_1 extends m_0 by adding N risky choice problems alongside the M uncertain choice problems. Both share the same utility function (υ, determined by δ) and sensitivity parameter (α).
### Data Structure
```{python}
#| label: m1-data-structure
#| echo: true
#| eval: false
# From m_1.stan data block:
# === UNCERTAIN DECISIONS (as in m_0) ===
# M: number of uncertain decision problems
# K: number of possible consequences
# D: dimensions of alternative features
# R: number of distinct uncertain alternatives
# w[R]: feature vectors for uncertain alternatives (vector[D] each)
# I[M,R]: indicator array for which alternatives appear in which problems
# y[M]: choices for uncertain problems
# === RISKY DECISIONS (new in m_1) ===
# N: number of risky decision problems
# S: number of distinct risky alternatives
# x[S]: probability simplexes for risky alternatives (simplex[K] each)
# J[N,S]: indicator array for which risky alternatives appear in which problems
# z[N]: choices for risky problems
```
### Key Differences from m_0
The crucial difference is in how probabilities enter:
**Uncertain alternatives** (same as m_0):
$$
\psi_{rk} = \frac{\exp(\boldsymbol{\beta}_k^\top \mathbf{w}_r)}{\sum_{k'=1}^K \exp(\boldsymbol{\beta}_{k'}^\top \mathbf{w}_r)}
$$
The probabilities ψ are *derived* from features via the learned mapping β.
**Risky alternatives** (new in m_1):
$$
\pi_{sk} = x_{sk} \quad \text{(given as data)}
$$
The probabilities π are *provided* directly—they are the objective lottery probabilities.
### Expected Utilities
For **uncertain** alternatives, expected utility is:
$$
\eta^{(u)}_r = \sum_{k=1}^K \psi_{rk} \cdot \upsilon_k = \boldsymbol{\psi}_r^\top \boldsymbol{\upsilon}
$$
For **risky** alternatives, expected utility is:
$$
\eta^{(r)}_s = \sum_{k=1}^K \pi_{sk} \cdot \upsilon_k = \boldsymbol{\pi}_s^\top \boldsymbol{\upsilon}
$$
The key insight: risky expected utilities depend *only* on υ (and hence δ), not on β. This breaks the confounding that plagued m_0.
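A minimal NumPy sketch of both computations, using illustrative values for β, the feature vector, and the lottery (stand-ins, not values from the study design). Note that `eta_risky` never touches `beta`:

```python
import numpy as np
rng = np.random.default_rng(0)

K, D = 3, 5
upsilon = np.array([0.0, 0.35, 1.0])   # utilities, cumsum of (0, delta)

# Uncertain alternative: psi derived from features via beta
beta = rng.normal(size=(K, D))         # feature-to-probability mapping
w = rng.normal(size=D)                 # feature vector of one alternative
logits = beta @ w
psi = np.exp(logits - logits.max())
psi /= psi.sum()                       # softmax -> subjective probabilities
eta_uncertain = psi @ upsilon          # depends on beta AND upsilon

# Risky alternative: probabilities given directly as data
x = np.array([0.5, 0.0, 0.5])          # known lottery probabilities
eta_risky = x @ upsilon                # depends ONLY on upsilon
print(eta_risky)  # 0.5*0 + 0*0.35 + 0.5*1 = 0.5
```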
### Choice Probabilities
Both choice types use the same softmax rule with shared α:
$$
\chi^{(u)}_{mi} = \frac{\exp(\alpha \cdot \eta^{(u)}_{mi})}{\sum_{j=1}^{N^{(u)}_m} \exp(\alpha \cdot \eta^{(u)}_{mj})}
\quad\text{and}\quad
\chi^{(r)}_{ni} = \frac{\exp(\alpha \cdot \eta^{(r)}_{ni})}{\sum_{j=1}^{N^{(r)}_n} \exp(\alpha \cdot \eta^{(r)}_{nj})}
$$
### Likelihood
The log-likelihood is the sum of contributions from both choice types:
$$
\log p(y, z | \theta) = \sum_{m=1}^M \log \chi^{(u)}_{m, y_m} + \sum_{n=1}^N \log \chi^{(r)}_{n, z_n}
$$
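The shared softmax rule and the summed log-likelihood can be sketched directly. The expected utilities and observed choices below are illustrative, not simulated from the model:

```python
import numpy as np

def choice_probs(eta, alpha):
    """Softmax choice rule with sensitivity alpha (shared by both types)."""
    z = alpha * np.asarray(eta)
    z = z - z.max()                # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

alpha = 2.0
eta_u = [0.40, 0.55, 0.30]         # expected utilities, one uncertain problem
eta_r = [0.50, 0.35]               # expected utilities, one risky problem
y, z_obs = 1, 0                    # observed choices (0-indexed)

# Log-likelihood: one term per problem, summed across both choice types
log_lik = (np.log(choice_probs(eta_u, alpha)[y])
           + np.log(choice_probs(eta_r, alpha)[z_obs]))
print(log_lik)
```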
## Examining the Stan Implementation
Let's examine key portions of `m_1.stan`:
```{python}
#| label: show-m1-stan
#| echo: false
# Display key sections of m_1.stan
m1_code = '''
// === PARAMETERS ===
parameters {
real<lower=0> alpha; // sensitivity (shared)
matrix[K,D] beta; // feature-to-probability mapping
simplex[K-1] delta; // utility increments (shared)
}
// === TRANSFORMED PARAMETERS ===
transformed parameters {
// Shared utility function
ordered[K] upsilon = cumulative_sum(append_row(0, delta));
// UNCERTAIN: subjective probabilities via softmax
for (i in 1:total_uncertain_alts) {
psi[i] = softmax(beta * w[i]);  // w[i]: feature vector from data
}
// UNCERTAIN: expected utilities
for (i in 1:total_uncertain_alts) {
eta_uncertain[i] = dot_product(psi[i], upsilon);
}
// RISKY: expected utilities (no beta involved!)
for (i in 1:total_risky_alts) {
eta_risky[i] = dot_product(x_risky[i], upsilon); // x_risky is known
}
// Choice probabilities via softmax with shared alpha
// ... (for both uncertain and risky problems)
}
// === MODEL ===
model {
// Priors (same as m_0)
alpha ~ lognormal(0, 1);
to_vector(beta) ~ std_normal();
delta ~ dirichlet(rep_vector(1, K-1));
// Likelihood: uncertain choices
for (m in 1:M) {
y[m] ~ categorical(chi_uncertain[m]);
}
// Likelihood: risky choices
for (n in 1:N) {
z[n] ~ categorical(chi_risky[n]);
}
}
'''
print(m1_code)
```
The critical line is `eta_risky[i] = dot_product(x_risky[i], upsilon)`. Unlike uncertain alternatives where β mediates between features and probabilities, risky alternatives have their probabilities given directly. This means risky choices provide *direct* information about the utility vector υ.
## Why Risky Choices Identify δ
Consider a concrete example. Suppose we have K=3 consequences with utilities $\upsilon = (0, \upsilon_2, 1)$ where $\upsilon_2 = \delta_1$ (since $\delta_1 + \delta_2 = 1$ and utilities are constructed cumulatively).
**Risky choice**: Choose between:
- Lottery A: (0.5, 0, 0.5) → outcomes 1 or 3 with equal probability
- Lottery B: (0, 1, 0) → outcome 2 with certainty
Expected utilities:
$$
\eta_A = 0.5 \cdot 0 + 0 \cdot \upsilon_2 + 0.5 \cdot 1 = 0.5
$$
$$
\eta_B = 0 \cdot 0 + 1 \cdot \upsilon_2 + 0 \cdot 1 = \upsilon_2
$$
The decision-maker prefers the alternative with higher expected utility:
- If they choose A, then $\eta_A > \eta_B$, implying $0.5 > \upsilon_2$
- If they choose B, then $\eta_B > \eta_A$, implying $\upsilon_2 > 0.5$
This preference *directly* constrains υ₂ (and hence δ₁) without any confounding from subjective probabilities! More generally, by presenting lotteries that span the consequence space, we can triangulate the utility function with arbitrary precision (given sufficient data and lottery diversity).
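Under the softmax choice rule, the probability of choosing B in this problem is a function of υ₂ alone, so every observed choice shifts the likelihood over δ₁ directly. A small sketch with an illustrative sensitivity α:

```python
import numpy as np

alpha = 3.0                               # illustrative sensitivity
eta_A = 0.5                               # EU of lottery A = (0.5, 0, 0.5)
upsilon2_grid = np.linspace(0, 1, 5)      # candidate values of upsilon_2 (= delta_1)

# EU of lottery B = (0, 1, 0) is upsilon_2; two-option softmax is a logistic
p_B = 1 / (1 + np.exp(-alpha * (upsilon2_grid - eta_A)))
for u2, p in zip(upsilon2_grid, p_B):
    print(f"upsilon_2 = {u2:.2f}  ->  P(choose B) = {p:.3f}")
```

P(choose B) increases monotonically in υ₂ and equals 0.5 exactly at υ₂ = 0.5, so repeated A/B choices pin down which side of 0.5 the true υ₂ lies on, with no role for β.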
## Study Design for m_1
To demonstrate that adding risky choices resolves the δ identification problem, we need a controlled comparison between m_0 and m_1. Crucially, the **uncertain decision problems must be identical** between the two models—any differences in recovery should be attributable solely to the addition of risky choices.
We first create a base m_0 study design (matching Report 4's configuration), then use the `from_base_study()` method to create an m_1 design that inherits the same uncertain alternatives and problems:
```{python}
#| label: m1-study-config
#| echo: true
from utils.study_design import StudyDesign
# Configuration for both models (uncertain component)
config_base = {
"M": 25, # Number of uncertain decision problems
"K": 3, # Number of consequences
"D": 5, # Feature dimensions
"R": 15, # Distinct uncertain alternatives
"min_alts_per_problem": 2,
"max_alts_per_problem": 5,
"feature_dist": "normal",
"feature_params": {"loc": 0, "scale": 1},
}
# Risky problems configuration (added for m_1)
config_risky = {
"N": 25, # Number of risky decision problems
"S": 15, # Distinct risky alternatives
}
# Create the base m_0 study design
study_base = StudyDesign(
M=config_base["M"],
K=config_base["K"],
D=config_base["D"],
R=config_base["R"],
min_alts_per_problem=config_base["min_alts_per_problem"],
max_alts_per_problem=config_base["max_alts_per_problem"],
feature_dist=config_base["feature_dist"],
feature_params=config_base["feature_params"],
design_name="m0_base_for_comparison"
)
study_base.generate()
print(f"Base Study Design (for m_0 and m_1 uncertain component):")
print(f" M = {study_base.M} uncertain decision problems")
print(f" R = {study_base.R} distinct alternatives")
print(f" K = {study_base.K} consequences")
print(f"\nRisky Problems (added for m_1):")
print(f" N = {config_risky['N']} risky decision problems")
print(f" S = {config_risky['S']} distinct lotteries")
```
## Parameter Recovery Comparison
In [Report 4](04_parameter_recovery.qmd), we found that δ recovery was weaker than α and β recovery in model m_0, with credible intervals that narrowed more slowly as sample size increased. We hypothesized that adding risky choices might help constrain utility parameters more directly.
Now we test this hypothesis. If the Anscombe-Aumann logic holds, risky choices should provide more direct information about the utility function, potentially improving δ recovery.
### Constructing the m_1 Design
We create the m_1 study design using `from_base_study()`, which ensures that the uncertain component (alternatives and problems) is **identical** to the base m_0 study. Only the risky problems are newly generated. This gives us 50 total decision problems—matching the sample size used in the M=50 analysis of m_0 (Report 4):
```{python}
#| label: setup-m1-recovery
#| output: false
from utils.study_design_m1 import StudyDesignM1
from analysis.parameter_recovery import ParameterRecovery
import tempfile
# Create the m_1 study design FROM the base study
# This ensures identical uncertain problems for valid comparison
study_m1 = StudyDesignM1.from_base_study(
base_study=study_base,
N=config_risky["N"], # 25 risky problems
S=config_risky["S"], # 15 risky alternatives
risky_probs="random", # Random simplex probabilities (diverse lotteries)
design_name="m1_parameter_recovery"
)
```
```{python}
#| label: design-m1-summary
#| echo: false
# Summarize the design
print(f"m_1 Study Design Generated:")
print(f"\n Uncertain Problems (inherited from base study):")
print(f" M = {study_m1.M} problems")
print(f" R = {study_m1.R} distinct alternatives")
print(f" Total options presented: {np.sum(study_m1.I)}")
print(f"\n Risky Problems (newly generated):")
print(f" N = {study_m1.N} problems")
print(f" S = {study_m1.S} distinct lotteries")
print(f" Total options presented: {np.sum(study_m1.J)}")
print(f"\n Shared:")
print(f" K = {study_m1.K} consequences")
print(f" Total options presented (both types): {np.sum(study_m1.I) + np.sum(study_m1.J)}")
```
### Running Parameter Recovery for m_1
```{python}
#| label: run-m1-recovery
#| output: false
# Create output directory
output_dir_m1 = tempfile.mkdtemp(prefix="param_recovery_m1_")
# Initialize and run parameter recovery for m_1
recovery_m1 = ParameterRecovery(
inference_model_path=os.path.join(project_root, "models", "m_1.stan"),
sim_model_path=os.path.join(project_root, "models", "m_1_sim.stan"),
study_design=study_m1,
output_dir=output_dir_m1,
n_mcmc_samples=1000,
n_mcmc_chains=4,
n_iterations=50
)
true_params_m1, posterior_summaries_m1 = recovery_m1.run()
```
```{python}
#| label: m1-recovery-summary
#| echo: false
n_successful_m1 = len(true_params_m1)
print(f"\nm_1 Parameter Recovery Complete:")
print(f" Successful iterations: {n_successful_m1}")
```
### Comparing m_0 and m_1 Recovery
Now we compare the key parameter recovery metrics between models. We focus particularly on δ, the parameter that was poorly identified in m_0:
```{python}
#| label: compute-m1-metrics
#| output: false
# Compute recovery metrics for m_1
# Alpha
alpha_true_m1 = np.array([p['alpha'] for p in true_params_m1])
alpha_mean_m1 = np.array([s.loc['alpha', 'Mean'] for s in posterior_summaries_m1])
alpha_lower_m1 = np.array([s.loc['alpha', '5%'] for s in posterior_summaries_m1])
alpha_upper_m1 = np.array([s.loc['alpha', '95%'] for s in posterior_summaries_m1])
alpha_bias_m1 = np.mean(alpha_mean_m1 - alpha_true_m1)
alpha_rmse_m1 = np.sqrt(np.mean((alpha_mean_m1 - alpha_true_m1)**2))
alpha_coverage_m1 = np.mean((alpha_true_m1 >= alpha_lower_m1) & (alpha_true_m1 <= alpha_upper_m1))
alpha_ci_width_m1 = np.mean(alpha_upper_m1 - alpha_lower_m1)
# Delta (using config from base study)
K_minus_1 = config_base['K'] - 1
delta_stats_m1 = []
for k in range(K_minus_1):
param_name = f"delta[{k+1}]"
delta_true = np.array([p['delta'][k] for p in true_params_m1])
delta_mean = np.array([s.loc[param_name, 'Mean'] for s in posterior_summaries_m1])
delta_lower = np.array([s.loc[param_name, '5%'] for s in posterior_summaries_m1])
delta_upper = np.array([s.loc[param_name, '95%'] for s in posterior_summaries_m1])
delta_stats_m1.append({
'parameter': f'δ_{k+1}',
'true': delta_true,
'mean': delta_mean,
'lower': delta_lower,
'upper': delta_upper,
'bias': np.mean(delta_mean - delta_true),
'rmse': np.sqrt(np.mean((delta_mean - delta_true)**2)),
'coverage': np.mean((delta_true >= delta_lower) & (delta_true <= delta_upper)),
'ci_width': np.mean(delta_upper - delta_lower)
})
delta_df_m1 = pd.DataFrame(delta_stats_m1)
# Beta (for comparison)
K, D = config_base['K'], config_base['D']
beta_stats_m1 = []
for k in range(K):
for d in range(D):
param_name = f"beta[{k+1},{d+1}]"
beta_true = np.array([p['beta'][k][d] for p in true_params_m1])
beta_mean = np.array([s.loc[param_name, 'Mean'] for s in posterior_summaries_m1])
beta_lower = np.array([s.loc[param_name, '5%'] for s in posterior_summaries_m1])
beta_upper = np.array([s.loc[param_name, '95%'] for s in posterior_summaries_m1])
beta_stats_m1.append({
'rmse': np.sqrt(np.mean((beta_mean - beta_true)**2)),
'coverage': np.mean((beta_true >= beta_lower) & (beta_true <= beta_upper)),
'ci_width': np.mean(beta_upper - beta_lower)
})
beta_df_m1 = pd.DataFrame(beta_stats_m1)
```
```{python}
#| label: fig-m0-m1-delta-comparison
#| fig-cap: "Comparison of δ recovery between m_0 and m_1. Adding risky choices dramatically improves recovery: narrower credible intervals and better coverage."
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
for k in range(K_minus_1):
# m_1 True vs Estimated (top row)
ax = axes[0, k]
ax.scatter(delta_stats_m1[k]['true'], delta_stats_m1[k]['mean'],
alpha=0.7, s=60, c='forestgreen', edgecolor='white', label='m_1')
ax.plot([0, 1], [0, 1], 'r--', linewidth=2, label='Identity')
ax.set_xlim(-0.05, 1.05)
ax.set_ylim(-0.05, 1.05)
ax.set_xlabel(f'True δ_{k+1}', fontsize=11)
ax.set_ylabel(f'Estimated δ_{k+1}', fontsize=11)
ax.set_title(f'm_1: δ_{k+1} Recovery (RMSE={delta_stats_m1[k]["rmse"]:.3f})', fontsize=11)
ax.set_aspect('equal')
ax.legend()
ax.grid(True, alpha=0.3)
# m_1 Coverage (bottom row)
ax = axes[1, k]
iterations = np.arange(len(delta_stats_m1[k]['true']))
for i in range(len(delta_stats_m1[k]['true'])):
covered = ((delta_stats_m1[k]['true'][i] >= delta_stats_m1[k]['lower'][i]) &
(delta_stats_m1[k]['true'][i] <= delta_stats_m1[k]['upper'][i]))
color = 'forestgreen' if covered else 'crimson'
ax.plot([i, i], [delta_stats_m1[k]['lower'][i], delta_stats_m1[k]['upper'][i]],
color=color, linewidth=2, alpha=0.7)
ax.scatter(i, delta_stats_m1[k]['mean'][i], color=color, s=40, zorder=3)
ax.scatter(iterations, delta_stats_m1[k]['true'], color='black', s=60, marker='x',
label='True value', zorder=4, linewidth=2)
ax.set_xlabel('Iteration', fontsize=11)
ax.set_ylabel(f'δ_{k+1}', fontsize=11)
ax.set_title(f'm_1: δ_{k+1} Coverage = {delta_stats_m1[k]["coverage"]:.0%}', fontsize=11)
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```
```{python}
#| label: run-m0-comparison-conditions
#| output: false
# Run m_0 recovery on the SAME base study used for m_1's uncertain component
# This ensures a valid comparison: identical uncertain problems, differing only
# in whether risky choices are added
output_dir_m0 = tempfile.mkdtemp(prefix="param_recovery_m0_compare_")
recovery_m0_compare = ParameterRecovery(
inference_model_path=None, # Default m_0.stan
sim_model_path=None, # Default m_0_sim.stan
study_design=study_base, # Use the same base study as m_1 (M=25)
output_dir=output_dir_m0,
n_mcmc_samples=1000,
n_mcmc_chains=4,
n_iterations=50
)
true_params_m0, posterior_summaries_m0 = recovery_m0_compare.run()
# Run m_0 with M=50 using SAME alternatives (only new problems)
# This uses the SAME alternatives as study_base, just with 25 additional problems
study_m50 = study_base.extend(additional_M=25, design_name="m0_extended_m50")
output_dir_m0_m50 = tempfile.mkdtemp(prefix="param_recovery_m0_m50_")
recovery_m0_m50 = ParameterRecovery(
inference_model_path=None,
sim_model_path=None,
study_design=study_m50,
output_dir=output_dir_m0_m50,
n_mcmc_samples=1000,
n_mcmc_chains=4,
n_iterations=50
)
true_params_m0_m50, posterior_summaries_m0_m50 = recovery_m0_m50.run()
# Run m_0 with M=50 using NEW alternatives (15 new feature vectors)
# This parallels how m_1 adds 15 new risky alternatives
study_m50_new_alts = study_base.extend_with_alternatives(
additional_M=25,
additional_R=15, # Same as S in m_1
design_name="m0_extended_m50_new_alts",
new_problems_use_new_alts_only=True # New problems only use new alternatives
)
output_dir_m0_m50_new = tempfile.mkdtemp(prefix="param_recovery_m0_m50_new_alts_")
recovery_m0_m50_new = ParameterRecovery(
inference_model_path=None,
sim_model_path=None,
study_design=study_m50_new_alts,
output_dir=output_dir_m0_m50_new,
n_mcmc_samples=1000,
n_mcmc_chains=4,
n_iterations=50
)
true_params_m0_m50_new, posterior_summaries_m0_m50_new = recovery_m0_m50_new.run()
# Compute m_0 delta metrics for all three m_0 conditions
delta_stats_m0 = []
delta_stats_m0_m50 = []
delta_stats_m0_m50_new = []
for k in range(K_minus_1):
param_name = f"delta[{k+1}]"
# M=25 (base)
delta_true = np.array([p['delta'][k] for p in true_params_m0])
delta_mean = np.array([s.loc[param_name, 'Mean'] for s in posterior_summaries_m0])
delta_lower = np.array([s.loc[param_name, '5%'] for s in posterior_summaries_m0])
delta_upper = np.array([s.loc[param_name, '95%'] for s in posterior_summaries_m0])
delta_stats_m0.append({
'parameter': f'δ_{k+1}',
'rmse': np.sqrt(np.mean((delta_mean - delta_true)**2)),
'coverage': np.mean((delta_true >= delta_lower) & (delta_true <= delta_upper)),
'ci_width': np.mean(delta_upper - delta_lower)
})
# M=50 (same alternatives)
delta_true_m50 = np.array([p['delta'][k] for p in true_params_m0_m50])
delta_mean_m50 = np.array([s.loc[param_name, 'Mean'] for s in posterior_summaries_m0_m50])
delta_lower_m50 = np.array([s.loc[param_name, '5%'] for s in posterior_summaries_m0_m50])
delta_upper_m50 = np.array([s.loc[param_name, '95%'] for s in posterior_summaries_m0_m50])
delta_stats_m0_m50.append({
'parameter': f'δ_{k+1}',
'rmse': np.sqrt(np.mean((delta_mean_m50 - delta_true_m50)**2)),
'coverage': np.mean((delta_true_m50 >= delta_lower_m50) & (delta_true_m50 <= delta_upper_m50)),
'ci_width': np.mean(delta_upper_m50 - delta_lower_m50)
})
# M=50 (new alternatives)
delta_true_m50_new = np.array([p['delta'][k] for p in true_params_m0_m50_new])
delta_mean_m50_new = np.array([s.loc[param_name, 'Mean'] for s in posterior_summaries_m0_m50_new])
delta_lower_m50_new = np.array([s.loc[param_name, '5%'] for s in posterior_summaries_m0_m50_new])
delta_upper_m50_new = np.array([s.loc[param_name, '95%'] for s in posterior_summaries_m0_m50_new])
delta_stats_m0_m50_new.append({
'parameter': f'δ_{k+1}',
'rmse': np.sqrt(np.mean((delta_mean_m50_new - delta_true_m50_new)**2)),
'coverage': np.mean((delta_true_m50_new >= delta_lower_m50_new) & (delta_true_m50_new <= delta_upper_m50_new)),
'ci_width': np.mean(delta_upper_m50_new - delta_lower_m50_new)
})
# Also compute alpha metrics for all m_0 conditions
alpha_true_m0_m50 = np.array([p['alpha'] for p in true_params_m0_m50])
alpha_mean_m0_m50 = np.array([s.loc['alpha', 'Mean'] for s in posterior_summaries_m0_m50])
alpha_lower_m0_m50 = np.array([s.loc['alpha', '5%'] for s in posterior_summaries_m0_m50])
alpha_upper_m0_m50 = np.array([s.loc['alpha', '95%'] for s in posterior_summaries_m0_m50])
alpha_rmse_m0_m50 = np.sqrt(np.mean((alpha_mean_m0_m50 - alpha_true_m0_m50)**2))
alpha_coverage_m0_m50 = np.mean((alpha_true_m0_m50 >= alpha_lower_m0_m50) & (alpha_true_m0_m50 <= alpha_upper_m0_m50))
alpha_ci_width_m0_m50 = np.mean(alpha_upper_m0_m50 - alpha_lower_m0_m50)
alpha_true_m0_m50_new = np.array([p['alpha'] for p in true_params_m0_m50_new])
alpha_mean_m0_m50_new = np.array([s.loc['alpha', 'Mean'] for s in posterior_summaries_m0_m50_new])
alpha_lower_m0_m50_new = np.array([s.loc['alpha', '5%'] for s in posterior_summaries_m0_m50_new])
alpha_upper_m0_m50_new = np.array([s.loc['alpha', '95%'] for s in posterior_summaries_m0_m50_new])
alpha_rmse_m0_m50_new = np.sqrt(np.mean((alpha_mean_m0_m50_new - alpha_true_m0_m50_new)**2))
alpha_coverage_m0_m50_new = np.mean((alpha_true_m0_m50_new >= alpha_lower_m0_m50_new) & (alpha_true_m0_m50_new <= alpha_upper_m0_m50_new))
alpha_ci_width_m0_m50_new = np.mean(alpha_upper_m0_m50_new - alpha_lower_m0_m50_new)
```
```{python}
#| label: fig-model-comparison-bars
#| fig-cap: "Comparison of δ recovery across four conditions: m_0 (M=25), m_0 (M=50 same alts), m_0 (M=50 new alts), and m_1 (M=25 + N=25 risky). The key comparison is between m_0 with new uncertain alternatives vs. m_1 with new risky alternatives."
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
params = [f'δ_{k+1}' for k in range(K_minus_1)]
x = np.arange(len(params))
width = 0.2
# RMSE comparison
ax = axes[0]
rmse_m0 = [delta_stats_m0[k]['rmse'] for k in range(K_minus_1)]
rmse_m0_m50 = [delta_stats_m0_m50[k]['rmse'] for k in range(K_minus_1)]
rmse_m0_m50_new = [delta_stats_m0_m50_new[k]['rmse'] for k in range(K_minus_1)]
rmse_m1 = [delta_stats_m1[k]['rmse'] for k in range(K_minus_1)]
bars1 = ax.bar(x - 1.5*width, rmse_m0, width, label='m_0 (M=25)', color='steelblue', alpha=0.7)
bars2 = ax.bar(x - 0.5*width, rmse_m0_m50, width, label='m_0 (M=50 same)', color='coral', alpha=0.7)
bars3 = ax.bar(x + 0.5*width, rmse_m0_m50_new, width, label='m_0 (M=50 new)', color='mediumseagreen', alpha=0.7)
bars4 = ax.bar(x + 1.5*width, rmse_m1, width, label='m_1 (M=25+N=25)', color='purple', alpha=0.7)
ax.set_xlabel('Parameter', fontsize=12)
ax.set_ylabel('RMSE', fontsize=12)
ax.set_title('δ RMSE by Condition', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(params)
ax.legend(loc='upper right', fontsize=9)
ax.grid(True, alpha=0.3, axis='y')
# Coverage comparison
ax = axes[1]
cov_m0 = [delta_stats_m0[k]['coverage'] for k in range(K_minus_1)]
cov_m0_m50 = [delta_stats_m0_m50[k]['coverage'] for k in range(K_minus_1)]
cov_m0_m50_new = [delta_stats_m0_m50_new[k]['coverage'] for k in range(K_minus_1)]
cov_m1 = [delta_stats_m1[k]['coverage'] for k in range(K_minus_1)]
bars1 = ax.bar(x - 1.5*width, cov_m0, width, label='m_0 (M=25)', color='steelblue', alpha=0.7)
bars2 = ax.bar(x - 0.5*width, cov_m0_m50, width, label='m_0 (M=50 same)', color='coral', alpha=0.7)
bars3 = ax.bar(x + 0.5*width, cov_m0_m50_new, width, label='m_0 (M=50 new)', color='mediumseagreen', alpha=0.7)
bars4 = ax.bar(x + 1.5*width, cov_m1, width, label='m_1 (M=25+N=25)', color='purple', alpha=0.7)
ax.axhline(y=0.90, color='red', linestyle='--', linewidth=2, label='Target (90%)')
ax.set_xlabel('Parameter', fontsize=12)
ax.set_ylabel('Coverage', fontsize=12)
ax.set_title('δ 90% CI Coverage by Condition', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(params)
ax.set_ylim(0, 1.05)
ax.legend(loc='lower right', fontsize=9)
ax.grid(True, alpha=0.3, axis='y')
# CI Width comparison
ax = axes[2]
ci_m0 = [delta_stats_m0[k]['ci_width'] for k in range(K_minus_1)]
ci_m0_m50 = [delta_stats_m0_m50[k]['ci_width'] for k in range(K_minus_1)]
ci_m0_m50_new = [delta_stats_m0_m50_new[k]['ci_width'] for k in range(K_minus_1)]
ci_m1 = [delta_stats_m1[k]['ci_width'] for k in range(K_minus_1)]
bars1 = ax.bar(x - 1.5*width, ci_m0, width, label='m_0 (M=25)', color='steelblue', alpha=0.7)
bars2 = ax.bar(x - 0.5*width, ci_m0_m50, width, label='m_0 (M=50 same)', color='coral', alpha=0.7)
bars3 = ax.bar(x + 0.5*width, ci_m0_m50_new, width, label='m_0 (M=50 new)', color='mediumseagreen', alpha=0.7)
bars4 = ax.bar(x + 1.5*width, ci_m1, width, label='m_1 (M=25+N=25)', color='purple', alpha=0.7)
ax.set_xlabel('Parameter', fontsize=12)
ax.set_ylabel('CI Width', fontsize=12)
ax.set_title('δ 90% CI Width by Condition', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(params)
ax.legend(loc='upper right', fontsize=9)
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
```
### Summary Statistics
```{python}
#| label: tbl-model-comparison
#| tbl-cap: "Parameter recovery comparison across four conditions. The key comparison is m_0 with new uncertain alternatives vs. m_1 with new risky alternatives."
# Build comparison table with all four conditions
comparison_rows = []
# Compute alpha metrics for m_0 M=25
alpha_true_m0 = np.array([p['alpha'] for p in true_params_m0])
alpha_mean_m0 = np.array([s.loc['alpha', 'Mean'] for s in posterior_summaries_m0])
alpha_lower_m0 = np.array([s.loc['alpha', '5%'] for s in posterior_summaries_m0])
alpha_upper_m0 = np.array([s.loc['alpha', '95%'] for s in posterior_summaries_m0])
alpha_rmse_m0 = np.sqrt(np.mean((alpha_mean_m0 - alpha_true_m0)**2))
alpha_coverage_m0 = np.mean((alpha_true_m0 >= alpha_lower_m0) & (alpha_true_m0 <= alpha_upper_m0))
alpha_ci_width_m0 = np.mean(alpha_upper_m0 - alpha_lower_m0)
# Alpha row
comparison_rows.append({
'Parameter': 'α',
'm_0 (M=25)': f'{alpha_rmse_m0:.4f}',
'm_0 (M=50 same)': f'{alpha_rmse_m0_m50:.4f}',
'm_0 (M=50 new)': f'{alpha_rmse_m0_m50_new:.4f}',
'm_1': f'{alpha_rmse_m1:.4f}',
})
# Delta rows
for k in range(K_minus_1):
comparison_rows.append({
'Parameter': f'δ_{k+1}',
'm_0 (M=25)': f'{delta_stats_m0[k]["rmse"]:.4f}',
'm_0 (M=50 same)': f'{delta_stats_m0_m50[k]["rmse"]:.4f}',
'm_0 (M=50 new)': f'{delta_stats_m0_m50_new[k]["rmse"]:.4f}',
'm_1': f'{delta_stats_m1[k]["rmse"]:.4f}',
})
comparison_df = pd.DataFrame(comparison_rows)
print("RMSE Comparison:")
print(comparison_df.to_string(index=False))
```
```{python}
#| label: tbl-ci-widths
#| tbl-cap: "90% credible interval widths across all four conditions."
# Build CI width comparison table
ci_rows = []
ci_rows.append({
'Parameter': 'α',
'm_0 (M=25)': f'{alpha_ci_width_m0:.3f}',
'm_0 (M=50 same)': f'{alpha_ci_width_m0_m50:.3f}',
'm_0 (M=50 new)': f'{alpha_ci_width_m0_m50_new:.3f}',
'm_1 (M=25+N=25)': f'{alpha_ci_width_m1:.3f}'
})
for k in range(K_minus_1):
ci_rows.append({
'Parameter': f'δ_{k+1}',
'm_0 (M=25)': f'{delta_stats_m0[k]["ci_width"]:.3f}',
'm_0 (M=50 same)': f'{delta_stats_m0_m50[k]["ci_width"]:.3f}',
'm_0 (M=50 new)': f'{delta_stats_m0_m50_new[k]["ci_width"]:.3f}',
'm_1 (M=25+N=25)': f'{delta_stats_m1[k]["ci_width"]:.3f}'
})
ci_df = pd.DataFrame(ci_rows)
print(ci_df.to_string(index=False))
```
::: {.callout-tip}
## Key Result: Comparing Uncertain vs. Risky Alternative Extensions
The four-way comparison now isolates what drives δ recovery improvement:
| Condition | Description | What it tests |
|-----------|-------------|---------------|
| m_0 (M=25) | Base study | Baseline |
| m_0 (M=50 same) | +25 problems, same R=15 alternatives | More data, same feature space |
| m_0 (M=50 new) | +25 problems using 15 **new** alternatives | Expanded feature space |
| m_1 (M=25+N=25) | +25 risky problems using 15 **new** lotteries | Known probabilities |
**The crucial comparison** is between conditions 3 and 4:
- Both add 25 new problems
- Both introduce 15 new choice objects (uncertain alternatives vs. risky lotteries)
- The key difference: risky alternatives have *known* probabilities, uncertain alternatives have *inferred* probabilities
If m_1 outperforms m_0 (M=50 new), it demonstrates that the **type** of data matters—not just the amount or diversity. Risky choices provide qualitatively different information about utilities because they bypass the probability inference step entirely.
:::
### Why This Helps
The improvement from adding risky choices can be understood through the likelihood structure:
**In m_0** (uncertain choices only):
$$
\log p(y | \theta) = \sum_{m=1}^M \log \chi_{m, y_m} \quad \text{where} \quad \chi \propto \exp(\alpha \cdot \boldsymbol{\psi}^\top \boldsymbol{\upsilon})
$$
Here, ψ (from β) and υ (from δ) appear together in expected utilities. While not completely confounded, information about δ must be extracted indirectly.
**In m_1** (uncertain + risky choices):
$$
\log p(y, z | \theta) = \underbrace{\sum_{m=1}^M \log \chi^{(u)}_{m, y_m}}_{\text{Informs } \alpha, \beta, \delta} + \underbrace{\sum_{n=1}^N \log \chi^{(r)}_{n, z_n}}_{\text{More directly informs } \delta}
$$
The risky choice likelihood depends on δ through $\boldsymbol{\upsilon} = \text{cumsum}([0, \delta])$, but **not** on β. This provides a more direct channel for learning about utilities, supplementing the indirect information from uncertain choices.
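This decomposition can be sketched directly in code. The following is a minimal illustration (the helper names `loglik_m0` and `loglik_m1` are ours, not the project's implementation): the risky term adds the same increment to the log likelihood no matter what β is, which is exactly the sense in which it carries β-free information about δ.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax."""
    e = np.exp(v - np.max(v))
    return e / e.sum()

def utilities(delta):
    """upsilon = cumsum([0, delta]): ordered utilities on [0, 1]."""
    return np.concatenate([[0.0], np.cumsum(delta)])

def loglik_m0(alpha, beta, delta, W_list, y):
    """Uncertain choices only: beta (via psi) and delta (via upsilon)
    enter jointly through the expected utilities."""
    ups = utilities(delta)
    ll = 0.0
    for W_m, y_m in zip(W_list, y):            # W_m: (n_alts, D) features
        eta = np.array([softmax(beta @ w) @ ups for w in W_m])
        ll += np.log(softmax(alpha * eta)[y_m])
    return ll

def loglik_m1(alpha, beta, delta, W_list, y, Pi_list, z):
    """Adds risky choices: the extra term depends on delta but not beta."""
    ll = loglik_m0(alpha, beta, delta, W_list, y)
    ups = utilities(delta)
    for Pi_n, z_n in zip(Pi_list, z):          # Pi_n: (n_alts, K) known probs
        ll += np.log(softmax(alpha * (Pi_n @ ups))[z_n])
    return ll

# Illustrative data: the risky increment is identical under two
# different beta matrices, i.e. it is beta-free information.
rng = np.random.default_rng(0)
K, D = 3, 2
delta = np.array([0.4, 0.6])
W_list = [rng.normal(size=(2, D)) for _ in range(3)]
y = [0, 1, 0]
Pi_list = [np.array([[1.0, 0.0, 0.0], [0.2, 0.5, 0.3]])]
z = [1]
b1, b2 = rng.normal(size=(K, D)), rng.normal(size=(K, D))
inc1 = loglik_m1(1.5, b1, delta, W_list, y, Pi_list, z) - loglik_m0(1.5, b1, delta, W_list, y)
inc2 = loglik_m1(1.5, b2, delta, W_list, y, Pi_list, z) - loglik_m0(1.5, b2, delta, W_list, y)
```

Here `inc1` and `inc2` agree to floating-point precision even though they are computed under different β matrices.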
## Summary
The extension from m_0 to m_1 follows a principled path grounded in decision theory:
1. **Differential recovery in m_0** ([Report 4](04_parameter_recovery.qmd)): Utility parameters show wider uncertainty than sensitivity parameters, though they do respond to data
2. **Three extensions compared**: More problems (same alternatives), more problems (new alternatives), risky problems (new lotteries)
3. **The historical insight**: Anscombe and Aumann recognized that combining risk and uncertainty provides complementary information about preferences
4. **The key comparison**: m_0 with new uncertain alternatives vs. m_1 with new risky alternatives isolates whether known probabilities provide unique advantages
::: {.callout-tip}
## Key Insight
The four-way comparison reveals whether the benefit of risky choices comes from:
- Simply having more diverse choice problems (testable via m_0 with new alternatives), or
- The qualitatively different information provided by known probabilities (unique to m_1)
If m_1 outperforms the expanded m_0, it validates the Anscombe-Aumann insight: risky choices provide a more direct route to utility identification because they bypass the subjective probability inference step.
The choice between approaches depends on practical considerations: the feasibility of including risky choice tasks, the importance of precise utility estimation, and whether the research question requires separating utilities from beliefs.
:::
## Formal Identification Analysis
The empirical results above demonstrate that m_1 achieves better parameter recovery than m_0, particularly for the utility parameters δ. In this section, we provide a formal analysis of why this occurs by establishing identification results for both models.
### Notation and Setup
Let the full parameter vector be $\theta = (\alpha, \beta, \delta)$ where:
- $\alpha > 0$ is the sensitivity parameter
- $\beta \in \mathbb{R}^{K \times D}$ is the feature-to-probability mapping matrix
- $\delta \in \Delta^{K-2}$ is the vector of utility increments on the simplex, i.e., $\delta = (\delta_1, \ldots, \delta_{K-1})$ with $\delta_k > 0$ and $\sum_{k=1}^{K-1} \delta_k = 1$
The utility vector $\upsilon \in \mathbb{R}^K$ is constructed as:
$$
\upsilon_1 = 0, \quad \upsilon_k = \sum_{j=1}^{k-1} \delta_j \text{ for } k = 2, \ldots, K
$$
This ensures $\upsilon$ is ordered with $0 = \upsilon_1 < \upsilon_2 < \cdots < \upsilon_K = 1$.
For an uncertain alternative with feature vector $\mathbf{w} \in \mathbb{R}^D$, the subjective probability vector is:
$$
\psi_k(\beta, \mathbf{w}) = \frac{\exp(\boldsymbol{\beta}_k^\top \mathbf{w})}{\sum_{j=1}^K \exp(\boldsymbol{\beta}_j^\top \mathbf{w})}
$$
For a risky alternative with probability vector $\boldsymbol{\pi} \in \Delta^{K-1}$, the probabilities are simply $\boldsymbol{\pi}$ (given as data).
Expected utilities are:
$$
\eta^{(u)}(\theta, \mathbf{w}) = \boldsymbol{\psi}(\beta, \mathbf{w})^\top \boldsymbol{\upsilon}(\delta), \quad \eta^{(r)}(\delta, \boldsymbol{\pi}) = \boldsymbol{\pi}^\top \boldsymbol{\upsilon}(\delta)
$$
Choice probabilities in a problem with alternatives yielding expected utilities $\eta_1, \ldots, \eta_n$ are:
$$
\chi_i(\alpha, \eta) = \frac{\exp(\alpha \cdot \eta_i)}{\sum_{j=1}^n \exp(\alpha \cdot \eta_j)}
$$
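As an illustration, these definitions translate line for line into code (a sketch with made-up parameter values; the helper names are ours, not the project's):

```python
import numpy as np

def upsilon(delta):
    """Ordered utilities 0 = u_1 < ... < u_K = 1 from simplex increments."""
    return np.concatenate([[0.0], np.cumsum(delta)])

def psi(beta, w):
    """Subjective probabilities: softmax of the logits beta_k^T w."""
    logits = beta @ w
    e = np.exp(logits - logits.max())
    return e / e.sum()

def chi(alpha, eta):
    """Choice probabilities: softmax of alpha-scaled expected utilities."""
    e = np.exp(alpha * (eta - eta.max()))
    return e / e.sum()

delta = np.array([0.25, 0.35, 0.40])        # K = 4 consequences, sums to 1
u = upsilon(delta)                          # [0.0, 0.25, 0.60, 1.0]
beta = np.array([[0.0, 0.0], [1.0, -0.5], [0.5, 1.0], [-1.0, 0.3]])
w = np.array([0.8, -0.2])
eta_u = psi(beta, w) @ u                    # uncertain: probs inferred from features
pi = np.array([0.1, 0.2, 0.3, 0.4])
eta_r = pi @ u                              # risky: probs given as data
probs = chi(3.0, np.array([eta_u, eta_r]))  # a binary choice problem
```

Note that `eta_r` requires no β at all: the probability vector is part of the problem description.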
### Definition of Identifiability
::: {.callout-note}
## Global Identifiability
Parameters $\theta$ are **globally identifiable** from a class of decision problems $\mathcal{P}$ if for any $\theta \neq \theta'$, there exists at least one problem $P \in \mathcal{P}$ such that the induced choice probability distributions differ:
$$
\chi^P(\theta) \neq \chi^P(\theta')
$$
:::
This definition captures whether, in principle, sufficiently rich data can distinguish any two parameter configurations. It is a necessary condition for consistent estimation.
### Identification in Model m_1
We first establish that m_1 achieves global identification under mild conditions.
::: {.callout-important}
## Theorem (Global Identifiability of m_1)
Consider model m_1 with $K \geq 2$ consequences and $D \geq 1$ feature dimensions. Let the class of decision problems include:
1. **Risky problems** with probability vectors from a set $\mathcal{X} \subseteq \Delta^{K-1}$ that contains $K$ affinely independent probability vectors
2. **Uncertain problems** with feature vectors from a set $\mathcal{W} \subseteq \mathbb{R}^D$ that spans $\mathbb{R}^D$
Then for any $\theta = (\alpha, \beta, \delta) \neq \theta' = (\alpha', \beta', \delta')$ with $\alpha, \alpha' > 0$ and $\beta, \beta'$ normalized so that $\boldsymbol{\beta}_1 = \boldsymbol{\beta}'_1 = \mathbf{0}$ (the softmax is invariant to adding a common vector to every row, so this normalization is without loss of generality), there exists a **binary choice problem** (with exactly two alternatives) where the choice probabilities under $\theta$ and $\theta'$ differ.
:::
Before proving the theorem, we establish a key lemma.
::: {.callout-note}
## Lemma (Affine Independence Implies Linear Independence for Probability Vectors)
Let $\boldsymbol{\pi}^{(1)}, \ldots, \boldsymbol{\pi}^{(K)} \in \Delta^{K-1}$ be $K$ affinely independent probability vectors. Then they are linearly independent in $\mathbb{R}^K$.
:::
**Proof of Lemma.** Recall that vectors $\boldsymbol{\pi}^{(1)}, \ldots, \boldsymbol{\pi}^{(K)}$ are **affinely dependent** if there exist scalars $c_1, \ldots, c_K$, *not all zero*, such that $\sum_s c_s = 0$ and $\sum_s c_s \boldsymbol{\pi}^{(s)} = \mathbf{0}$. Equivalently, they are affinely independent if no such non-trivial relation exists.
Now suppose $\sum_{s=1}^K c_s \boldsymbol{\pi}^{(s)} = \mathbf{0}$ for some scalars $c_1, \ldots, c_K$ (a linear dependence relation). Taking the inner product with $\mathbf{1} = (1, \ldots, 1)^\top$ and using the fact that $\mathbf{1}^\top \boldsymbol{\pi}^{(s)} = 1$ for all probability vectors, we obtain:
$$
0 = \mathbf{1}^\top \mathbf{0} = \mathbf{1}^\top \left( \sum_{s=1}^K c_s \boldsymbol{\pi}^{(s)} \right) = \sum_{s=1}^K c_s (\mathbf{1}^\top \boldsymbol{\pi}^{(s)}) = \sum_{s=1}^K c_s
$$
Thus any linear dependence among probability vectors automatically satisfies $\sum_s c_s = 0$. This means any linear dependence would also constitute an affine dependence. Since our vectors are affinely independent by assumption, no non-trivial linear dependence can exist. Therefore $c_s = 0$ for all $s$, establishing linear independence. $\square$
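The lemma is easy to check numerically. In this sketch, two corner lotteries plus an interior point are affinely independent and give a full-rank matrix, while replacing the interior point by a midpoint produces an affinely (hence linearly) dependent set:

```python
import numpy as np

# K = 3 affinely independent points of the simplex: two corner
# lotteries plus one interior lottery. Per the lemma, the matrix of
# their coordinates is full rank (linearly independent rows).
Pi_indep = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.2, 0.3, 0.5],
])
assert np.linalg.matrix_rank(Pi_indep) == 3

# An affinely dependent set: the third row is the midpoint of the
# first two, so c = (1/2, 1/2, -1) sums to zero and annihilates them.
# Consistent with the lemma, the matrix drops rank.
Pi_dep = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
])
assert np.linalg.matrix_rank(Pi_dep) == 2
```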
**Proof of Theorem.** We proceed by cases, showing that any difference in parameters can be detected by a binary choice problem.
*Case 1: $\delta \neq \delta'$ (utilities differ).*
By assumption, $\mathcal{X}$ contains $K$ affinely independent probability vectors $\boldsymbol{\pi}^{(1)}, \ldots, \boldsymbol{\pi}^{(K)}$. By the Lemma, these are linearly independent in $\mathbb{R}^K$.
Let $\Pi$ be the $K \times K$ matrix with rows $(\boldsymbol{\pi}^{(s)})^\top$. Since these rows are linearly independent, $\Pi$ is invertible. Consider the expected utilities under both parameter configurations:
$$
\eta^{(r)}_s = (\boldsymbol{\pi}^{(s)})^\top \boldsymbol{\upsilon}(\delta), \quad (\eta')^{(r)}_s = (\boldsymbol{\pi}^{(s)})^\top \boldsymbol{\upsilon}(\delta')
$$
In vector form: $\boldsymbol{\eta} = \Pi \boldsymbol{\upsilon}(\delta)$ and $\boldsymbol{\eta}' = \Pi \boldsymbol{\upsilon}(\delta')$. Since $\Pi$ is invertible:
$$
\boldsymbol{\eta} = \boldsymbol{\eta}' \iff \boldsymbol{\upsilon}(\delta) = \boldsymbol{\upsilon}(\delta')
$$
Now, $\delta \neq \delta'$ implies $\boldsymbol{\upsilon}(\delta) \neq \boldsymbol{\upsilon}(\delta')$ because the map $\delta \mapsto \boldsymbol{\upsilon}(\delta) = \text{cumsum}([0, \delta])$ is injective on $\Delta^{K-2}$. Therefore, $\boldsymbol{\eta} \neq \boldsymbol{\eta}'$, meaning at least one pair of expected utilities differs: $\eta^{(r)}_s \neq (\eta')^{(r)}_s$ for some $s$.
Suppose, for contradiction, that every binary risky choice problem yields identical choice probabilities under $\theta$ and $\theta'$. For alternatives $\boldsymbol{\pi}^{(s)}$ and $\boldsymbol{\pi}^{(t)}$, the probability of choosing the first is:
$$
\chi_1 = \frac{\exp(\alpha \cdot \eta^{(r)}_s)}{\exp(\alpha \cdot \eta^{(r)}_s) + \exp(\alpha \cdot \eta^{(r)}_t)} = \sigma\bigl(\alpha(\eta^{(r)}_s - \eta^{(r)}_t)\bigr)
$$
where $\sigma$ is the logistic function. Since $\sigma$ is strictly increasing, equality of the choice probabilities for every pair $(s, t)$ requires $\alpha(\eta^{(r)}_s - \eta^{(r)}_t) = \alpha'((\eta')^{(r)}_s - (\eta')^{(r)}_t)$ for all $s, t$, i.e., $\alpha \boldsymbol{\eta} = \alpha' \boldsymbol{\eta}' + c \mathbf{1}$ for some constant $c$. Because each row of $\Pi$ is a probability vector, $\Pi \mathbf{1} = \mathbf{1}$ and hence $\Pi^{-1} \mathbf{1} = \mathbf{1}$; multiplying by $\Pi^{-1}$ gives $\alpha \boldsymbol{\upsilon}(\delta) = \alpha' \boldsymbol{\upsilon}(\delta') + c \mathbf{1}$. The boundary condition $\upsilon_1 = 0$ for both configurations forces $c = 0$, and $\upsilon_K = 1$ then forces $\alpha = \alpha'$; hence $\boldsymbol{\upsilon}(\delta) = \boldsymbol{\upsilon}(\delta')$ and $\delta = \delta'$, a contradiction. Therefore some binary risky problem distinguishes $\theta$ from $\theta'$, regardless of whether $\alpha = \alpha'$.
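Case 1's mechanism can be seen numerically: because $\Pi$ is invertible, the risky expected utilities determine the utility vector exactly, so distinct δ produce distinct η (a sketch with illustrative lotteries):

```python
import numpy as np

def upsilon(delta):
    """Ordered utilities cumsum([0, delta])."""
    return np.concatenate([[0.0], np.cumsum(delta)])

# K = 3 affinely independent lotteries as rows, hence Pi is invertible
Pi = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.2, 0.5, 0.3],
])
u1 = upsilon(np.array([0.4, 0.6]))     # [0.0, 0.4, 1.0]
u2 = upsilon(np.array([0.7, 0.3]))     # [0.0, 0.7, 1.0]
eta1, eta2 = Pi @ u1, Pi @ u2

# Invertibility: the risky expected utilities pin down the utility
# vector exactly, so equal etas would force equal deltas.
recovered = np.linalg.solve(Pi, eta1)
```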
*Case 2: $\delta = \delta'$ but $\alpha \neq \alpha'$.*
Consider a risky choice problem with two alternatives having distinct expected utilities $\eta_1 \neq \eta_2$. Such a pair exists among the $K$ affinely independent lotteries: if all of them had equal expected utility, then $\Pi \boldsymbol{\upsilon} = c\mathbf{1}$ would give $\boldsymbol{\upsilon} = c\mathbf{1}$ (since $\Pi^{-1}\mathbf{1} = \mathbf{1}$), contradicting $\upsilon_1 = 0 < 1 = \upsilon_K$. The choice probability for alternative 1 is:
$$
\chi_1 = \sigma\bigl(\alpha(\eta_1 - \eta_2)\bigr)
$$
This is a strictly monotonic function of $\alpha$ when $\eta_1 \neq \eta_2$, so $\chi_1(\alpha) \neq \chi_1(\alpha')$.
*Case 3: $\delta = \delta'$, $\alpha = \alpha'$, but $\beta \neq \beta'$.*
The softmax depends on $\beta$ only up to a common shift of its rows, so we work under the normalization $\boldsymbol{\beta}_1 = \boldsymbol{\beta}'_1 = \mathbf{0}$ from the theorem statement. Then $\beta \neq \beta'$ means some row $k \geq 2$ satisfies $\boldsymbol{\beta}_k \neq \boldsymbol{\beta}'_k$, and since $\mathcal{W}$ spans $\mathbb{R}^D$, there is $\mathbf{w} \in \mathcal{W}$ with $(\boldsymbol{\beta}_k - \boldsymbol{\beta}'_k)^\top \mathbf{w} \neq 0$. The difference $(\beta - \beta')\mathbf{w}$ then has first component zero and $k$-th component nonzero, so it is not a constant vector (i.e., not a scalar multiple of $\mathbf{1}$).
Consider the subjective probability vectors $\boldsymbol{\psi}(\beta, \mathbf{w}) = \text{softmax}(\beta \mathbf{w})$ and $\boldsymbol{\psi}(\beta', \mathbf{w}) = \text{softmax}(\beta' \mathbf{w})$. The softmax function has the property that $\text{softmax}(\mathbf{v}) = \text{softmax}(\mathbf{v}')$ if and only if $\mathbf{v} - \mathbf{v}' = c \cdot \mathbf{1}$ for some scalar $c$. Since $\beta \mathbf{w} - \beta' \mathbf{w}$ is not a constant vector, we have $\boldsymbol{\psi}(\beta, \mathbf{w}) \neq \boldsymbol{\psi}(\beta', \mathbf{w})$.
Since $\delta = \delta'$ implies $\boldsymbol{\upsilon}(\delta) = \boldsymbol{\upsilon}(\delta')$, and $\boldsymbol{\psi} \neq \boldsymbol{\psi}'$, the expected utilities differ:
$$
\eta^{(u)}(\mathbf{w}) = \boldsymbol{\psi}^\top \boldsymbol{\upsilon} \neq (\boldsymbol{\psi}')^\top \boldsymbol{\upsilon} = (\eta')^{(u)}(\mathbf{w})
$$
(Equality would require $\boldsymbol{\psi} - \boldsymbol{\psi}'$ to be orthogonal to $\boldsymbol{\upsilon}$. Since $\boldsymbol{\psi} - \boldsymbol{\psi}'$ already sums to zero while $\boldsymbol{\upsilon}$, having distinct ordered components, is not proportional to $\mathbf{1}$, this is an additional nontrivial linear condition, and it fails except on a set of measure zero in the parameter space; Case 3 is therefore generic rather than exhaustive.)
Constructing an uncertain choice problem with this alternative yields different choice probabilities.
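The softmax shift-invariance property used in this case can be verified directly:

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax."""
    e = np.exp(v - v.max())
    return e / e.sum()

v = np.array([0.3, -1.2, 0.7])
shifted = softmax(v + 5.0)                           # constant shift: unchanged
perturbed = softmax(v + np.array([0.5, 0.0, 0.0]))   # non-constant shift: changed
```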
**Combining the cases:** For any $\theta \neq \theta'$, at least one of the three cases applies, guaranteeing the existence of a distinguishing decision problem. $\square$
### Non-Identifiability in Model m_0
In contrast, model m_0 faces a more challenging identification situation because the risky choice channel is absent.
::: {.callout-warning}
## Proposition (Identification Structure of m_0)
In model m_0 (uncertain choices only):
1. The sensitivity parameter $\alpha$ is globally identified.
2. The expected utility function $\eta^{(u)}: \mathbb{R}^D \to [0,1]$ is globally identified.
3. The individual components $\beta$ and $\delta$ are **weakly identified** through the composite mapping $(\beta, \delta) \mapsto \eta^{(u)}(\cdot)$. While generic point identification holds (distinct $(\beta, \delta)$ pairs produce distinct expected utility functions except on a set of measure zero), information about these parameters accumulates more slowly from data compared to m_1, resulting in wider posterior credible intervals.
:::
**Proof.** We establish each claim.
*Claim 1 ($\alpha$ is identified):* Consider any two uncertain alternatives with features $\mathbf{w}_1, \mathbf{w}_2$ yielding distinct expected utilities $\eta_1 \neq \eta_2$ (such alternatives exist generically). The choice probability is $\chi_1 = \sigma(\alpha(\eta_1 - \eta_2))$ where $\sigma$ is the logistic function. Since $\sigma$ is strictly monotonic and $(\eta_1 - \eta_2) \neq 0$, different values of $\alpha$ yield different choice probabilities.
*Claim 2 ($\eta^{(u)}(\cdot)$ is identified):* Suppose two parameter configurations $(\alpha, \beta, \delta)$ and $(\alpha', \beta', \delta')$ yield the same expected utility function: $\eta^{(u)}(\mathbf{w}; \beta, \delta) = \eta^{(u)}(\mathbf{w}; \beta', \delta')$ for all $\mathbf{w}$. Then for any uncertain choice problem with alternatives having features $\mathbf{w}_1, \ldots, \mathbf{w}_n$, the expected utilities are identical, and hence (given $\alpha = \alpha'$ from Claim 1) the choice probabilities are identical. Conversely, if $\eta^{(u)}(\mathbf{w}; \beta, \delta) \neq \eta^{(u)}(\mathbf{w}; \beta', \delta')$ for some $\mathbf{w}$, we can construct a binary choice problem distinguishing the two configurations (by the same argument as in the m_1 proof).
*Claim 3 (weak identification of components):* We analyze the information structure for identifying $\beta$ and $\delta$ separately. Consider $K = 3$ with utility vector $\boldsymbol{\upsilon} = (0, \delta_1, 1)$. The expected utility is:
$$
\eta^{(u)}(\mathbf{w}) = \psi_2(\beta, \mathbf{w}) \cdot \delta_1 + \psi_3(\beta, \mathbf{w})
$$
where $\psi_2 + \psi_3 = 1 - \psi_1$. The softmax probabilities $\psi_k$ depend on $\beta$ only through the differences $\boldsymbol{\beta}_k - \boldsymbol{\beta}_1$ (the softmax is invariant to adding a constant to all logits). Thus $\beta$ has a $(K-1) \times D = 2D$ dimensional "effective" parameter space for determining $\psi$.
The expected utility function $\eta^{(u)}: \mathbb{R}^D \to [0,1]$ is determined by how $\psi_2$ and $\psi_3$ vary with $\mathbf{w}$, weighted by $\delta_1$ and $1$ respectively. The question is whether distinct $(\beta, \delta)$ pairs can produce identical $\eta^{(u)}(\cdot)$ functions.
**Generic point identification holds:** For the equality $\psi_2(\beta, \mathbf{w}) \cdot \delta_1 + \psi_3(\beta, \mathbf{w}) = \psi_2(\beta', \mathbf{w}) \cdot \delta'_1 + \psi_3(\beta', \mathbf{w})$ to hold for all $\mathbf{w} \in \mathbb{R}^D$, we would need a precise functional relationship between $(\beta, \delta)$ and $(\beta', \delta')$ that holds across the entire feature space. This occurs only on a set of measure zero in the parameter space.
**Weak identification in practice:** While generic point identification holds, the *rate* at which information accumulates about $\beta$ and $\delta$ separately is slower than in m_1. In m_0, all information about $\delta$ must be extracted through uncertain choices where $\beta$ and $\delta$ interact multiplicatively in the expected utility. This interaction means that uncertainty about $\beta$ propagates to uncertainty about $\delta$ and vice versa, resulting in slower posterior concentration. $\square$
**Discussion.** The proposition clarifies what m_0 can and cannot identify:
- **What is strongly identified:** The sensitivity $\alpha$ and the expected utility function $\eta^{(u)}(\cdot)$ are fully determined by choice data. This means m_0 can predict choice behavior for any new uncertain alternative, given its features.
- **What is weakly identified:** The decomposition of $\eta^{(u)}$ into beliefs ($\psi$, via $\beta$) and utilities ($\upsilon$, via $\delta$) faces a more challenging inference problem. While distinct $(\beta, \delta)$ pairs generically produce distinct expected utility functions (point identification), the information geometry is unfavorable: $\beta$ and $\delta$ interact multiplicatively in the expected utility, so uncertainty in one parameter propagates to the other. In m_1, risky choices directly constrain $\delta$ through the likelihood term $\sum_n \log \chi^{(r)}_{n, z_n}$, which depends on $\delta$ but not $\beta$. This separation breaks the multiplicative coupling and allows faster learning.
- **Practical consequence:** The weak identification of $\delta$ in m_0 manifests as slower posterior concentration—wider credible intervals that shrink more slowly with sample size compared to m_1. This matches our empirical parameter recovery findings. Note that this is a *rate* phenomenon, not a fundamental non-identification: with sufficient data, m_0 posteriors do concentrate on the true values.
### Implications for Experimental Design
The identification results have direct implications for experimental design:
| Design Choice | Effect on Identification |
|---------------|-------------------------|
| Include risky choices with diverse lotteries | Enables direct identification of $\delta$ (and hence $\upsilon$) |
| Use spanning set of probability vectors | Ensures full utility function is identified, not just projections |
| Use spanning set of feature vectors | Ensures full $\beta$ matrix is identified |
| Include both simple (degenerate) and mixed lotteries | Provides both "anchoring" and "interpolation" information about utilities |
::: {.callout-note}
## Practical Guidance
For m_1, the spanning conditions are easily satisfied:
1. **Risky alternatives:** Include the $K$ "corner" lotteries $\mathbf{e}_1, \ldots, \mathbf{e}_K$ (certain outcomes) plus several interior lotteries. The corners alone provide $K$ linearly independent vectors.
2. **Uncertain alternatives:** Use feature vectors that vary across all $D$ dimensions. Random draws from a continuous distribution (as in our study design) satisfy this almost surely.
The key practical constraint is whether risky choice tasks are feasible and meaningful in the application domain.
:::
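A sketch of this design and its rank check (lottery counts are illustrative; assumes NumPy):

```python
import numpy as np

rng = np.random.default_rng(42)
K = 4
corners = np.eye(K)                          # the K certain outcomes e_1..e_K
interior = rng.dirichlet(np.ones(K), size=6)  # mixed (interior) lotteries
lotteries = np.vstack([corners, interior])

# The corners alone are K linearly independent probability vectors,
# so the affine-independence condition of the theorem is satisfied;
# the interior lotteries add the "interpolation" information.
rank = np.linalg.matrix_rank(lotteries)
```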
### Remarks on the Identification Results
The identification analysis clarifies why m_1 achieves better parameter recovery than m_0:
1. **Separation of constraints:** In m_1, risky choices provide constraints on $(\alpha, \delta)$ that are *independent* of $\beta$. This separation means that information about utilities accumulates directly from risky choice data, without needing to simultaneously infer the belief-formation process.
2. **The role of known probabilities:** The key advantage of risky alternatives is not merely that they add more data, but that their probability vectors are *given* rather than *inferred*. This eliminates one layer of the inference problem.
3. **Connection to classical results:** Our identification result for m_1 is related to—but distinct from—the Anscombe-Aumann representation theorem. That theorem establishes *when* preferences can be represented by subjective expected utility; our result establishes *when* the parameters of a specific SEU model can be recovered from choice data. The underlying intuition is similar: mixing objective and subjective probabilities provides complementary information.
## References